dtSearch Engine API for Java
Options.setUnicodeFilterRanges Method

Specifies the Unicode subranges that the filtering algorithm should look for.

Syntax
Java
public void setUnicodeFilterRanges(boolean[] unicodeFilterRanges);

For example, if elements 1 and 8 of unicodeFilterRanges are set to true, the filtering algorithm will look for characters in U+0100-U+01FF and U+0800-U+08FF.

This setting helps the filtering algorithm distinguish text from non-text data. It is used only as a hint, so if the text extraction algorithm detects text in another language with a sufficient level of confidence, it will return that text even if that language was not selected.

Each boolean value corresponds to one 256-character subrange, so an array with the first and second values (indexes 0 and 1) set to true specifies the range U+0000 through U+01FF.
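
The index-to-subrange mapping described above can be sketched in Java as follows. This is a minimal illustration, not part of the dtSearch Engine API: the class UnicodeFilterRangesSketch and its methods are hypothetical helpers, and the array length of 256 (one flag per 256-character block of U+0000 through U+FFFF) is an assumption that this page does not state.

```java
// Hypothetical helper for illustration only; not part of the dtSearch Engine API.
final class UnicodeFilterRangesSketch {

    // Assumption: one flag per 256-character block covering U+0000-U+FFFF.
    static final int SUBRANGE_COUNT = 256;

    /** Builds a flag array in which the given subrange indexes are set to true. */
    static boolean[] buildRanges(int... subrangeIndexes) {
        boolean[] ranges = new boolean[SUBRANGE_COUNT];
        for (int index : subrangeIndexes) {
            if (index < 0 || index >= SUBRANGE_COUNT) {
                throw new IllegalArgumentException("Subrange index out of bounds: " + index);
            }
            ranges[index] = true;
        }
        return ranges;
    }

    /** Returns the index of the 256-character subrange containing a code point. */
    static int subrangeIndexOf(int codePoint) {
        return codePoint >>> 8; // same as codePoint / 256 for non-negative values
    }

    /** Formats the subrange at the given index, e.g. index 1 gives "U+0100-U+01FF". */
    static String describe(int index) {
        return String.format("U+%04X-U+%04X", index << 8, (index << 8) | 0xFF);
    }

    public static void main(String[] args) {
        // Select subranges 1 and 8, as in the example in the text.
        boolean[] ranges = buildRanges(1, 8);
        for (int i = 0; i < ranges.length; i++) {
            if (ranges[i]) {
                System.out.println(describe(i));
            }
        }
        // With the dtSearch library on the classpath, the array would then be
        // passed to an Options instance: options.setUnicodeFilterRanges(ranges);
    }
}
```

Working with subrange indexes rather than raw code points keeps the caller's intent close to the array layout. subrangeIndexOf can be used to find the index for a specific character when the subrange is not known in advance.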